[internal-dns] register and publish ddmd in the switch zone #10381
zeeshanlakhani wants to merge 8 commits into
DDMD has always run in the switch zone alongside Dendrite, MGS, and MGD, but it was never registered in internal DNS, leaving no path for a cross-host consumer to discover it.

This adds `ServiceName::Ddm`, plumbs `ddm_port` through the host-zone switch (RSS plan + reconfigurator DNS execution), threads an `Overridables::ddm_ports` map for the test suite, and lands a `DdmInstance` dropshot sim in test-utils so that the test harness registers a real DDM port in DNS the same way it does for the other switch-zone services.

We also drop the duplicate `DDMD_PORT` const in `ddm-admin-client` in favor of the canonical `omicron_common::address::DDMD_PORT`. Same-host callers continue to use `Client::localhost()`.

This was extracted from the multicast PR (zl/multicast-mgd-ddm), in which Nexus is the first consumer to resolve ddmd through DNS and reach it cross-host.
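As a rough illustration of the test-suite plumbing, the sketch below models an `Overridables::ddm_ports`-style map: a fixture that binds a non-default port registers it per sled, and DNS publication falls back to the production port otherwise. This is a std-only sketch, not Omicron's code; `SledId`, the `ddm_port` helper, and the value 8000 for `DDMD_PORT` are assumptions (the canonical constant is `omicron_common::address::DDMD_PORT`).

```rust
use std::collections::HashMap;

/// Assumed value for illustration only; the canonical constant is
/// `omicron_common::address::DDMD_PORT`.
const DDMD_PORT: u16 = 8000;

/// Hypothetical stand-in for a sled identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SledId(u32);

/// Sketch of test-only port overrides for switch-zone services.
#[derive(Debug, Default)]
struct Overridables {
    /// DDM admin ports for sleds whose switch-zone ddmd is a test
    /// fixture listening on a non-default port.
    ddm_ports: HashMap<SledId, u16>,
}

impl Overridables {
    /// Port to publish in DNS for the DDM service on `sled`: the
    /// override if one is registered, otherwise the production port.
    fn ddm_port(&self, sled: SledId) -> u16 {
        self.ddm_ports.get(&sled).copied().unwrap_or(DDMD_PORT)
    }
}

fn main() {
    let mut overrides = Overridables::default();
    // A fixture bound to an ephemeral port registers it here.
    overrides.ddm_ports.insert(SledId(1), 41234);

    println!("{}", overrides.ddm_port(SledId(1))); // 41234
    println!("{}", overrides.ddm_port(SledId(2))); // 8000
}
```

Keeping the fallback inside one lookup means production code paths never need to know whether a test override exists.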
jgallagher left a comment
Thank you very much for splitting this out!
Omicron's oxidecomputer/omicron#10381 introduces a stubbed `ddmd` admin endpoint because spawning a real `ddmd` in a generic test toolchain is not viable: the routing state machine (discovery, exchange, route synchronization) depends on illumos networking facilities the toolchain does not provide. Consumers of the stub, e.g., Nexus RPW (multicast members), sled-agent's DDM reconciler, and anything that resolves the DDM internal-DNS service name, cannot exercise the real admin surface from Omicron's test harness.

This work adds an opt-in `--no-state-machine` flag to `ddmd` that runs only the admin API server and skips the state machine entirely, allowing the fixture to spawn the real binary. This is analogous to `mgd --no-bgp-dispatcher`, which Omicron's `MgdInstance` already uses for the same purpose.

To make the fixture path usable on Linux, `ddmd` itself must build on Linux. The previous code pulled the illumos-only crates `libnet`, `dpd-client`, `opte-ioctl`, and `oxide-vpc` unconditionally through `ddm`, which failed to link on Linux (`-lzfs`, `-ldlpi`). This change introduces an `illumos` feature in both `ddm` and `ddmd` (default-on, mirroring `mgd`'s `mg-lower` pattern) that marks those four crates optional. The buildomat `linux.sh` job now builds `ddmd` and `ddmadm`, with `ddmd` invoked as `cargo build --bin ddmd --no-default-features`.

The illumos-only halves of `ddm` are isolated by the feature gate:

- The routing state machine implementation moves from `sm.rs` into `sm/state.rs`.
- The exchange runtime (HTTP push/pull and route programming) moves from `exchange.rs` into `exchange/runtime.rs`.
- The discovery runtime (UDPv6 solicitation/advertisement loops) moves from `discovery.rs` into `discovery/runtime.rs`.

Each parent `mod.rs` keeps the platform-agnostic types and re-exports the runtime surface so existing call sites resolve unchanged on illumos. The runtime submodules are gated as a unit by `#[cfg(all(feature = "illumos", target_os = "illumos"))]`.

We also remove the single-function `ddm/src/util.rs`, inlining the function into `discovery/runtime.rs`, where its sole caller lives. The SIGTERM cleanup handler is installed regardless of the flag, so Ctrl-C still exits cleanly in `--no-state-machine` mode. The imported route sets are empty in that mode, so the cleanup itself is a no-op. Passing `--addr` alongside `--no-state-machine` is harmless but ignored, with a warning logged.
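A minimal single-file sketch of the gating pattern, assuming inline modules in place of the `sm/mod.rs` + `sm/state.rs` split: the platform-agnostic type always compiles, while the runtime module and its re-export exist only when both the `illumos` feature and the illumos target are present. `Config`, `run`, and `runtime_available` are hypothetical names, not maghemite's API.

```rust
// Inline-module stand-in for `sm/mod.rs` (agnostic types) and
// `sm/state.rs` (illumos-only runtime).
mod sm {
    /// Platform-agnostic type: compiles everywhere, including Linux
    /// builds made with `--no-default-features`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Config {
        pub name: String,
    }

    /// Illumos-only runtime, gated as a unit. Without both the feature
    /// and the target OS, this module is not compiled at all.
    #[cfg(all(feature = "illumos", target_os = "illumos"))]
    mod state {
        use super::Config;

        pub fn run(cfg: &Config) {
            // The real state machine would start here.
            let _ = cfg;
        }
    }

    // Re-export so illumos callers keep writing `sm::run`.
    #[cfg(all(feature = "illumos", target_os = "illumos"))]
    pub use state::run;

    /// Reports whether the gated runtime was compiled into this build.
    pub fn runtime_available() -> bool {
        cfg!(all(feature = "illumos", target_os = "illumos"))
    }
}

fn main() {
    let cfg = sm::Config { name: "example".to_string() };
    println!("{:?}, runtime compiled: {}", cfg, sm::runtime_available());
}
```

Because one `cfg` covers both the module and its re-export, a Linux build without the feature never references the illumos-only dependencies, while illumos call sites keep the same path.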
…fixture

We address @jgallagher's review by:

- Replacing the four positional `u16` arguments in `DnsConfigBuilder::host_zone_switch` with a `HostSwitchZonePorts` named-fields structure.
- Replacing the dropshot-based stubbed `DdmInstance` in test-utils with a fixture that spawns and supervises a real `ddmd` subprocess running with `--no-state-machine`, analogous to `MgdInstance` and `mgd --no-bgp-dispatcher`.

Only the switch-zone `ddmd` is registered in internal DNS; sled-global-zone instances are accessed locally by their own host and don't need DNS registration. This **does** require maghemite changes, already PR'ed to oxidecomputer/maghemite#729.

To make this all work, we wire `ddmd` into the developer xtask toolchain. `cargo xtask download maghemite-ddmd` reuses the existing `mg-ddm.tar.gz` illumos zone artifact (extracting `ddmd`/`ddmadm`). On Linux it overlays a raw `ddmd` binary, and on macOS it builds from source. We also had to bump `oxnet` from 0.1.4 to 0.1.5 to satisfy the new maghemite pin.
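To illustrate the first review change, here is a hypothetical before/after of a builder method that takes switch-zone ports. With four positional `u16`s, transposing two ports still type-checks; a struct with named fields makes each port explicit at the call site. The field names, the record format, and the builder internals are assumptions, and any other arguments the real `DnsConfigBuilder::host_zone_switch` takes are omitted.

```rust
/// Named ports for the services in a host's switch zone.
/// Field names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct HostSwitchZonePorts {
    dendrite: u16,
    mgs: u16,
    mgd: u16,
    ddm: u16,
}

/// Hypothetical DNS builder that just records (service, port) pairs.
#[derive(Debug, Default)]
struct DnsConfigBuilder {
    records: Vec<(&'static str, u16)>,
}

impl DnsConfigBuilder {
    // Before (sketch): `fn host_zone_switch(&mut self, a: u16, b: u16, c: u16, d: u16)`,
    // where swapping two arguments compiles silently.
    // After: a single struct whose fields are named at the call site.
    fn host_zone_switch(&mut self, ports: HostSwitchZonePorts) {
        self.records.push(("dendrite", ports.dendrite));
        self.records.push(("mgs", ports.mgs));
        self.records.push(("mgd", ports.mgd));
        self.records.push(("ddm", ports.ddm));
    }
}

fn main() {
    let mut builder = DnsConfigBuilder::default();
    // Illustrative port values only.
    builder.host_zone_switch(HostSwitchZonePorts {
        dendrite: 10001,
        mgs: 10002,
        mgd: 10003,
        ddm: 10004,
    });
    println!("{:?}", builder.records);
}
```

Adding a fifth switch-zone service later becomes a new field rather than another positional argument every caller must reorder.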
Picks up recent oxidecomputer/maghemite#729 (the `ddmd --api-only` flag) and the preceding main changes that moved canonical types out of the auto-generated client into the `mg-api-types` crate. Includes:

- replaces `rdb-types` (removed upstream) with `mg-api-types` as a direct workspace dep
- bumps `num_enum` 0.7.5 -> 0.7.6 to satisfy maghemite's workspace pin
- migrates types to `mg-api-types`
- renames `bgp_apply_v2` callers to `bgp_apply`
- the `DdmInstance` fixture now passes `--api-only` instead of `--no-state-machine`, to match the new clap flag
jgallagher left a comment
Changes all LGTM - just one comment about PR ordering.
Omicron's oxidecomputer/omicron#10381 introduces a stubbed `ddmd` admin endpoint because spawning a real `ddmd` in a generic test toolchain is not viable: the routing state machine (discovery, exchange, route synchronization) depends on illumos networking facilities the toolchain does not provide. Consumers of the stub, e.g., Nexus RPW (multicast members), sled-agent's DDM reconciler, and anything that resolves the DDM internal-DNS service name, cannot exercise the real admin surface from Omicron's test harness.

This work adds an opt-in `--api-only` flag to `ddmd` that runs only the admin API server and skips the state machine entirely, allowing the fixture to spawn the real binary. This is analogous to `mgd --no-bgp-dispatcher`, which Omicron's `MgdInstance` already uses for the same purpose.

To make the fixture path usable on Linux, `ddmd` itself must build on Linux. The previous code pulled the illumos-only crates `libnet`, `dpd-client`, `opte-ioctl`, and `oxide-vpc` unconditionally through `ddm`, which failed to link on Linux (`-lzfs`, `-ldlpi`). This change introduces a `backend` feature in both `ddm` and `ddmd` (default-on, mirroring `mgd`'s `mg-lower` pattern) that marks those four crates optional. The buildomat `linux.sh` job now builds `ddmd` and `ddmadm`, with `ddmd` invoked as `cargo build --bin ddmd --no-default-features`.

The illumos-only halves of `ddm` are isolated by the feature gate:

- The routing state machine implementation moves from `sm.rs` into `sm/state.rs`.
- The exchange runtime (HTTP push/pull and route programming) moves from `exchange.rs` into `exchange/runtime.rs`.
- The discovery runtime (UDPv6 solicitation/advertisement loops) moves from `discovery.rs` into `discovery/runtime.rs`.

Each parent `mod.rs` keeps the platform-agnostic types and re-exports the runtime surface so existing call sites resolve unchanged on illumos. The runtime submodules are gated as a unit by `#[cfg(all(feature = "backend", target_os = "illumos"))]`.

We also remove the single-function `ddm/src/util.rs`, inlining the function into `discovery/runtime.rs`, where its sole caller lives. The SIGTERM cleanup handler is installed regardless of the flag, so Ctrl-C still exits cleanly in `--api-only` mode. The imported route sets are empty in that mode, so the cleanup itself is a no-op. `--api-only` and `--addr` are mutually exclusive at the clap level (`conflicts_with`), so passing them together is rejected at parse time.
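The startup behavior described above can be sketched with the standard library alone. The hand-rolled `parse` function stands in for clap's `conflicts_with` (it is not clap), and `Component`, `Opts`, `startup_plan`, and the single-value `--addr` grammar are hypothetical simplifications of `ddmd`'s real CLI.

```rust
/// Pieces ddmd might start (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Component {
    SigtermCleanup,
    AdminApi,
    StateMachine,
}

#[derive(Debug, PartialEq, Eq)]
struct Opts {
    api_only: bool,
    addrs: Vec<String>,
}

/// Stand-in for clap's `conflicts_with`: reject `--api-only`
/// together with any `--addr` at parse time.
fn parse(args: &[&str]) -> Result<Opts, String> {
    let mut opts = Opts { api_only: false, addrs: Vec::new() };
    let mut it = args.iter();
    while let Some(arg) = it.next() {
        match *arg {
            "--api-only" => opts.api_only = true,
            "--addr" => {
                let v = it.next().ok_or("--addr requires a value")?;
                opts.addrs.push(v.to_string());
            }
            other => return Err(format!("unexpected argument: {other}")),
        }
    }
    if opts.api_only && !opts.addrs.is_empty() {
        return Err("--api-only cannot be used with --addr".to_string());
    }
    Ok(opts)
}

/// What gets started: the cleanup handler and admin API always,
/// the state machine only outside API-only mode.
fn startup_plan(opts: &Opts) -> Vec<Component> {
    let mut plan = vec![Component::SigtermCleanup, Component::AdminApi];
    if !opts.api_only {
        plan.push(Component::StateMachine);
    }
    plan
}

fn main() {
    let opts = parse(&["--api-only"]).unwrap();
    println!("{:?}", startup_plan(&opts)); // [SigtermCleanup, AdminApi]
    // Err("--api-only cannot be used with --addr")
    println!("{:?}", parse(&["--api-only", "--addr", "example0/v6"]));
}
```

Installing the cleanup handler unconditionally keeps shutdown handling identical in both modes; in API-only mode it simply finds no imported routes to remove.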
This brings main forward and updates maghemite to current main (9bb5037167c1ff0d812299f668841c9b7bda4480, which includes the merged oxidecomputer/maghemite#729 with the `ddmd --api-only` flag). We also bump workspace `clap` from 4.5 to 4.6 to satisfy the new maghemite constraint. The lockfile changes cascade through to align omicron-as-git refs at 915f229 as well. Finally, we add `oxlog` to the `[patch."github.com/oxidecomputer/omicron"]` list to resolve a duplicate-package error from maghemite's transitive `illumos-utils` -> `oxlog` dependency.
@taspelund this also gets us aligned with all your work in maghemite so far.
Awesome, thanks zeeshan! The diff looks okay to me, but I haven't come up to speed on omicron yet, so I'd definitely rely on John's review from that standpoint.
jgallagher left a comment
LGTM - one small nit, and a question about cross-repo dependencies.
…one more mags update
I was holding off on incorporating the latest main from @taspelund, but we were waiting on the lab-3.0-gimlet image there for falcon's CI job. @taspelund, thoughts?
DDMD has always run in the switch zone alongside Dendrite, MGS, and MGD, but it was never registered in internal DNS, leaving no path for a cross-host consumer to discover it. This adds `ServiceName::Ddm`, plumbs `ddm_port` through the host-zone switch (RSS plan + reconfigurator DNS execution), threads an `Overridables::ddm_ports` map for the test suite, and includes a `DdmInstance` test fixture in test-utils that spawns a real `ddmd` subprocess via `--no-state-machine` (matching `MgdInstance`'s pattern) so that the test harness registers a real DDM port in DNS the same way it does for the other switch-zone services.

We also drop the duplicate `DDMD_PORT` const in `ddm-admin-client` in favor of the canonical `omicron_common::address::DDMD_PORT`. Same-host callers continue to use `Client::localhost()`.

The real-subprocess fixture depends on oxidecomputer/maghemite#729, which adds a `--no-state-machine` flag to `ddmd` that skips the kernel-related state machine and leaves only the admin API running.

This was extracted from the multicast PR (zl/multicast-mgd-ddm), in which Nexus is the first consumer to resolve ddmd through DNS and reach it cross-host.
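The fixture's core job is supervising a subprocess: spawn it, check that it is alive, and make sure it is killed and reaped even if a test panics. The following is a minimal sketch of that lifecycle using `std::process`. `FixtureProcess` is a hypothetical name and this is not the `DdmInstance`/`MgdInstance` code; it uses `sleep 30` as a stand-in for `ddmd` so it runs anywhere `sleep` is available, and it omits the port discovery and readiness checks a real fixture would need.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};

/// Minimal supervisor for a test-fixture subprocess: the child is
/// killed and reaped when the guard is stopped or dropped.
struct FixtureProcess {
    child: Option<Child>,
}

impl FixtureProcess {
    fn spawn(mut cmd: Command) -> io::Result<Self> {
        let child = cmd.spawn()?;
        Ok(Self { child: Some(child) })
    }

    /// Returns true if the child has been spawned and has not exited.
    fn is_running(&mut self) -> io::Result<bool> {
        match self.child.as_mut() {
            Some(c) => Ok(c.try_wait()?.is_none()),
            None => Ok(false),
        }
    }

    /// Kills the child (if any) and waits for it, returning its exit
    /// status; returns `None` if it was already stopped.
    fn stop(&mut self) -> io::Result<Option<ExitStatus>> {
        let Some(mut child) = self.child.take() else {
            return Ok(None);
        };
        // Ignore kill() errors: the child may already have exited,
        // and wait() still reaps it either way.
        let _ = child.kill();
        child.wait().map(Some)
    }
}

impl Drop for FixtureProcess {
    fn drop(&mut self) {
        // Best effort: never leave an orphaned daemon behind a test.
        let _ = self.stop();
    }
}

fn main() -> io::Result<()> {
    // Stand-in for something like `ddmd --no-state-machine ...`.
    let mut cmd = Command::new("sleep");
    cmd.arg("30");
    let mut fixture = FixtureProcess::spawn(cmd)?;
    println!("running: {}", fixture.is_running()?);
    println!("stopped: {:?}", fixture.stop()?);
    Ok(())
}
```

Doing the cleanup in `Drop` means a panicking test still tears the daemon down, which matters when many tests each start their own instance.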